A chromosome-level genome assembly of the pig-nosed turtle (Carettochelys insculpta)

The pig-nosed turtle (Carettochelys insculpta), the only extant species in the family Carettochelyidae, is a unique member of Trionychia that is fully adapted to aquatic life and is currently endangered. To improve our understanding of this species and support its conservation, we used high-fidelity (HiFi) and Hi-C sequencing to generate a chromosome-level genome assembly. The assembly spans 2.18 Gb, with a contig N50 of 126 Mb, and comprises 34 chromosomes that account for 99.6% of the genome. It has BUSCO scores above 95% across different lineage databases and shows strong collinearity with the Yangtze giant softshell turtle (Rafetus swinhoei), indicating its completeness and continuity. A total of 19,175 genes were annotated, and repetitive sequences make up 46.86% of the genome. This chromosome-scale genome is a valuable resource for the pig-nosed turtle, providing insights into its aquatic adaptation and serving as a foundation for future turtle research.


Background & Summary
The pig-nosed turtle is a remarkable and unique chelonian, the only extant species within the genus Carettochelys 1. Known for its distinct pig-like nose, paddle-shaped fore flippers bearing claws, and a shell that can reach up to 50 cm in length in females, this species has evolved a suite of phenotypes that set it apart from other turtles 1. The closest living relative of the softshell turtles, the pig-nosed turtle lives in large rivers, swamps, lagoons, and other freshwater environments in southern Irian Jaya (Indonesia), southern Papua New Guinea, and the major river systems of the northwestern Northern Territory in Australia 2,3.
Unfortunately, despite its evolutionary and ecological significance, the pig-nosed turtle faces a high risk of extinction due to a combination of habitat loss, overexploitation for food and the pet trade, and other anthropogenic pressures 4. These threats have caused a precipitous decline in its wild population, leading to its classification as "Insufficiently Known" in the 1982 Red Data Book 5 and as "Vulnerable" in the 1996 Red List 6. These conservation challenges highlight the urgent need for comprehensive molecular data to inform effective protection strategies.
Although extensive research has been conducted on turtle sex determination and other traits 7,8, the pig-nosed turtle, a unique Trionychia member fully adapted to aquatic life, has not yet been systematically analyzed because high-quality genome data have been lacking. In this study, we present a chromosome-level draft genome of the pig-nosed turtle (~2.18 Gb) with a contig N50 of 126 Mb, generated using PacBio high-fidelity (HiFi) sequencing and chromosome conformation capture (Hi-C) sequencing. Conserved core genes (BUSCO score) and genome synteny confirmed the continuity and accuracy of the assembly, which represents the highest-quality Testudines genome assembled to date. Overall, this high-quality genome serves as a valuable resource for future research on chelonian evolution and conservation.

Methods
Sample collection and sequencing. An artificially bred pig-nosed turtle was obtained from an aquarium in Xiong County, Hebei Province, China. The turtle was anesthetized, and its abdominal cavity was exposed to collect tissue samples, including liver, muscle, kidney, spleen, trachea, and lungs. These samples were promptly frozen in liquid nitrogen and stored at −80 °C. Genomic DNA extracted from muscle tissue was used for both short-read sequencing and HiFi sequencing, while liver was used for Hi-C sequencing. Additionally, muscle, liver, kidney, spleen, lung, and trachea tissues were used for RNA-Seq to obtain a comprehensive annotation of protein-coding genes. Animal care and experimental protocols were approved by the Northwestern Polytechnic University Ethics Committee Institutional Review Board (approval number 202301024).
The Illumina sequencing library was generated using the NEBNext® Ultra™ DNA Library Prep Kit (NEB, USA), and sequencing was performed on the HiSeq 2000 platform. Raw reads were trimmed and filtered to remove adapter sequences and low-quality reads using Fastp v0.20 9 with default parameters, yielding a total of 189.18 Gb of clean short reads. To generate HiFi reads, we followed PacBio's standard protocol (Pacific Biosciences, CA, USA), which yielded 74.23 Gb of clean long reads on the PacBio Sequel II platform. For Hi-C library construction, we followed the standard protocol described in a published study 10. The library was sequenced on the HiSeq X Ten platform, producing a total of 234 Gb of clean reads after filtering with Fastp v0.20 9.
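As a rough guide to how much data each library contributes, the yields above can be converted to approximate sequencing depth by dividing by the assembly size. The sketch below is illustrative only: the depths it prints are derived here from the reported yields and the 2.18 Gb assembly span, they are not figures stated by the authors, and the function name is hypothetical.

```python
# Approximate sequencing depth = total clean bases / assembly size.
# Assumption: the 2.18 Gb assembly span is used as the genome size.
ASSEMBLY_SIZE_GB = 2.18

# Clean data yields reported in the text (Gb).
YIELDS_GB = {
    "Illumina short reads": 189.18,
    "PacBio HiFi": 74.23,
    "Hi-C": 234.0,
}


def fold_coverage(yield_gb: float, genome_gb: float = ASSEMBLY_SIZE_GB) -> float:
    """Return approximate fold coverage (sequenced bases per genome base)."""
    if genome_gb <= 0:
        raise ValueError("genome size must be positive")
    return yield_gb / genome_gb


if __name__ == "__main__":
    for library, gb in YIELDS_GB.items():
        print(f"{library}: ~{fold_coverage(gb):.0f}x")
```

Using the k-mer based genome size estimate instead of the assembly span would change these depths only slightly.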
To obtain RNA sequences from multiple tissues, RNA was extracted with the TRIzol kit (TIANGEN, Cat # DP424, China). After libraries were constructed for each sample, they were sequenced on the Illumina NovaSeq 6000 platform, and the raw data were cleaned with Fastp v0.20 9, yielding more than 6 Gb of data per sample.

Genome annotation.
A comprehensive genome annotation was performed for the pig-nosed turtle, focusing on repetitive sequences, protein-coding genes, and functional predictions. For repetitive sequence annotation, approximately 1 Gb of repetitive regions in the assembled sequence were detected by de novo annotation and homology annotation. For de novo annotation, Tandem Repeats Finder (TRF) v4.0.9 30
Fig. 2 Quality statistics of genomes of the representative turtles. (a-c) The x-axis indicates the contig N50 of the genome assembly, while the y-axis indicates the genome BUSCO scores based on the tetrapoda_odb10, sauropsida_odb10 and vertebrata_odb10 lineage databases, respectively.
For protein-coding gene prediction, we combined three methods: de novo prediction, homology-based prediction, and transcript-based prediction. For de novo prediction, we used Augustus v2.5.5 33. For homology-based prediction, we downloaded the protein sets of the closely related species mentioned above and predicted gene structures from these homologous proteins using miniprot v0.12-r237 34. For transcript-based prediction, we derived pig-nosed turtle protein sets from the RNA-Seq data: transcriptomes were assembled with SPAdes v3.1.1 35, and protein-coding regions were predicted with TransDecoder v5.5.0 (https://github.com/TransDecoder/). The predicted proteins were aligned to the pig-nosed turtle genome using BLAST v2.6.0 36, and gene structures were predicted using GeneWise v2.2.0 37. The results of the three prediction methods were integrated into a final gene set using EVidenceModeler v1.1.1 38. A total of 19,175 protein-coding genes were annotated in the pig-nosed turtle genome assembly, with a BUSCO completeness of 97.6% using the tetrapoda_odb10 lineage database.

Technical Validation
The integrity of the extracted DNA was checked by agarose gel electrophoresis, and DNA concentration was determined with a Qubit fluorometer using the 1× dsDNA HS kit. The contig N50 (126.6 Mb) and BUSCO score (97.60%) are considerably higher than those of other turtle genomes (Fig. 2). The high anchoring rate (99.6%) and the strong linear relationship between the genome assemblies of the pig-nosed turtle and the Yangtze giant softshell turtle confirm that we obtained a contiguous pig-nosed turtle genome.
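For readers unfamiliar with the continuity metric used here, contig N50 is the length L such that contigs of length at least L together cover at least half of the assembly. The following is a minimal sketch of that standard definition, using made-up contig lengths; it is not the script used to produce the statistics in this paper.

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length of the shortest sequence in the smallest set of
    longest sequences that together cover at least half of the total.
    """
    if not lengths:
        raise ValueError("need at least one sequence length")
    ordered = sorted(lengths, reverse=True)
    half_total = sum(ordered) / 2
    running = 0
    for length in ordered:
        running += length
        if running >= half_total:
            return length


if __name__ == "__main__":
    # Toy contig lengths in Mb (illustrative values, not from the assembly).
    toy_contigs = [80, 70, 50, 40, 30, 20]
    print("N50 =", n50(toy_contigs), "Mb")
```

A higher N50 means that more of the assembly sits in long, uninterrupted contigs, which is why it is plotted against BUSCO completeness in Fig. 2.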

Fig. 1
Fig. 1 Genome assembly of the pig-nosed turtle. (a) K-mer distributions (k = 21, 23, 25, 27) of the pig-nosed turtle genome. Genome size was inferred as the total number of k-mers divided by the k-mer depth. The estimated genome size ranges from 2.15 to 2.17 Gb. (b) Hi-C linkage density heat map of the pig-nosed turtle. The x-axis and y-axis represent genomic positions. Red dots indicate regions with a high density of paired reads, suggesting that they are more likely to be on the same chromosome.
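The k-mer estimate in panel (a) can be sketched as follows, assuming a k-mer histogram that maps each depth to the number of distinct k-mers observed at that depth. The histogram values, the `min_depth` error cutoff, and the function name are all hypothetical; the authors' actual tool and settings are not stated in this section.

```python
def estimate_genome_size(histogram, min_depth=5):
    """Estimate genome size as total k-mer count / peak k-mer depth.

    histogram: dict mapping k-mer depth -> number of distinct k-mers
               seen at that depth.
    min_depth: depths below this are treated as sequencing errors and
               ignored (a hypothetical cutoff chosen for illustration).
    """
    kept = {depth: count for depth, count in histogram.items() if depth >= min_depth}
    if not kept:
        raise ValueError("no k-mers at or above min_depth")
    # The main peak of the distribution approximates the k-mer depth.
    peak_depth = max(kept, key=kept.get)
    # Total k-mer occurrences = sum over depths of depth * distinct k-mers.
    total_kmers = sum(depth * count for depth, count in kept.items())
    return total_kmers / peak_depth


if __name__ == "__main__":
    # Toy histogram: low-depth error k-mers plus one peak centred on depth 30.
    toy = {1: 1000, 2: 300, 28: 50, 29: 80, 30: 100, 31: 80, 32: 50}
    print("Estimated size:", estimate_genome_size(toy), "k-mer positions")
```

Real estimators also model heterozygosity and repeat peaks, which this sketch deliberately ignores.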

Fig. 3
Fig. 3 Overview of the pig-nosed turtle genome. (a) Synteny alignment between the pig-nosed turtle and R. swinhoei. The blue circle represents the chromosomes of the pig-nosed turtle, while the grey circle represents the chromosomes of R. swinhoei. (b) The densities of protein-coding genes and of different types of repeat sequences, together with GC content, are shown in the inner rims with a window size of 10 Mb. The numbers 1 to 34 correspond to the chromosomes of the pig-nosed turtle.

Table 1.
Statistics of the genome assembly.

Table 2.
Statistics of the genome assembly on the presence of conserved BUSCO orthologs.

generate soft-masked sequences. The repetitive sequences account for 46.86% of the total genome length and include 310 Mb of DNA elements (14.23% of total length), 346.89 Mb of long interspersed nuclear elements (LINEs) (15.91% of total length), 25.38 Mb of short interspersed nuclear elements (SINEs) (1.16% of total length), and 121.3 Mb of long terminal repeats (LTRs) (5.56% of total length) (Fig.
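The repeat lengths and percentages quoted above can be cross-checked against the 2.18 Gb assembly span. The sketch below is a simple consistency check under the assumption that the percentages are taken relative to that span; the function name is hypothetical.

```python
# Assumption: percentages are relative to the 2.18 Gb (2180 Mb) assembly span.
ASSEMBLY_MB = 2180.0

# Repeat class -> (length in Mb, reported percentage of genome), from the text.
REPEATS = {
    "DNA elements": (310.0, 14.23),
    "LINEs": (346.89, 15.91),
    "SINEs": (25.38, 1.16),
    "LTRs": (121.3, 5.56),
}


def percent_of_genome(length_mb, genome_mb=ASSEMBLY_MB):
    """Return the share of the genome covered by a repeat class, in percent."""
    return 100.0 * length_mb / genome_mb


if __name__ == "__main__":
    for name, (length_mb, reported) in REPEATS.items():
        computed = percent_of_genome(length_mb)
        print(f"{name}: computed {computed:.2f}% vs reported {reported:.2f}%")
    listed_total = sum(pct for _, pct in REPEATS.values())
    print(f"Listed classes sum to {listed_total:.2f}% of the 46.86% total")
```

The four listed classes do not add up to the full 46.86%; the remainder presumably falls in repeat categories not itemized in this sentence.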

Table 3.
Statistics of the functional annotation of protein-coding genes.